The objective of this project is to create a Convolutional Neural Network that can accurately identify seedlings presented to it in the form of image files.
The project will use a dataset from Kaggle that has been condensed down to a reasonable size for class purposes by the academic staff.
Since the data will be image data, there is no data dictionary for this project. The images and truth labels will be imported and processed.
Import the necessary libraries:
import numpy as np # arrays
import pandas as pd # dataframes for displaying results tables
import matplotlib.pyplot as plt # basic plotting
import seaborn as sns # fancy styling
import cv2 # image manipulation
import tensorflow as tf # neural nets
import os # operating system functions
import gc # garbage collection, to free memory between model runs
from warnings import simplefilter
simplefilter(action='ignore', category=FutureWarning)
sns.set()
sns.set_style('whitegrid')
# Set the random seed for reproducibility
seed = 1
np.random.seed(seed)
tf.random.set_seed(seed)
Check that GPU compute is enabled and that TensorFlow can see the GPU and use it.
tf.config.list_physical_devices('GPU')
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Load the images from the provided files. For this project it's more practical to keep the data in numpy arrays instead of pandas dataframes as the data is not tabular in nature.
images = np.load('images.npy')
classes = np.loadtxt('labels.csv', dtype=str, delimiter=',', skiprows=1)
text_classes = classes # keep a text copy of the labels; `classes` will be binarized later
print('images shape: ' + str(images.shape))
print('classes shape: ' + str(classes.shape))
images shape: (4750, 128, 128, 3)
classes shape: (4750,)
We have 4750 color images with a resolution of 128x128.
Store the unique labels in the dataset.
unique_classes = np.unique(classes)
print(unique_classes)
['Black-grass' 'Charlock' 'Cleavers' 'Common Chickweed' 'Common wheat' 'Fat Hen' 'Loose Silky-bent' 'Maize' 'Scentless Mayweed' 'Shepherds Purse' 'Small-flowered Cranesbill' 'Sugar beet']
Store the starting and ending index of each class's block of images for evaluation later.
class_indices = []
for cl in unique_classes:
    indices = np.where(classes == cl)[0]
    first = indices[0]
    last = indices[-1]
    class_indices.append((first, last))
class_indices
[(3833, 4095), (2034, 2423), (2424, 2710), (1423, 2033), (1202, 1422), (496, 970), (4096, 4749), (3612, 3832), (2711, 3226), (971, 1201), (0, 495), (3227, 3611)]
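The (first, last) pairs only describe each class correctly if every class occupies one contiguous block of indices, which the output above suggests is the case here. A minimal sketch of a sanity check for that assumption, shown on a small synthetic stand-in for the `classes` array:

```python
import numpy as np

def blocks_are_contiguous(labels):
    """Return True if every label occupies one unbroken run of indices."""
    for cl in np.unique(labels):
        idx = np.where(labels == cl)[0]
        # contiguous means consecutive indices with no gaps
        if not np.all(np.diff(idx) == 1):
            return False
    return True

labels = np.array(['a'] * 3 + ['b'] * 2 + ['c'] * 4)
print(blocks_are_contiguous(labels))                     # True
print(blocks_are_contiguous(np.array(['a', 'b', 'a'])))  # False
```

If this ever returned False for the real labels, the slicing-by-block approach below would silently mix classes.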
Class balance assessment:
plt.figure(figsize=(12, 6))
plt.title("Class Balance of Seedlings", fontsize=16)
labels, counts = np.unique(classes, return_counts=True)
plt.bar(labels, counts)
plt.xticks(rotation=90, fontsize=16)
plt.show()
The classes are certainly imbalanced, but there is a decent number of samples in each class. I will try to build a model without oversampling and, if that doesn't go well, try oversampling.
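The exact counts behind the bar chart are easy to tabulate with pandas. A minimal sketch, using a small synthetic stand-in for the real `classes` array loaded from labels.csv:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real label array
labels = np.array(['Charlock'] * 5 + ['Maize'] * 3 + ['Fat Hen'] * 8)

# value_counts sorts classes by frequency, making the imbalance easy to read off
counts = pd.Series(labels).value_counts()
print(counts)
```

On the real data, the same one-liner gives the per-class sample counts that the bar chart only shows approximately.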
Create a function to print the first image from each class so that we can evaluate image processing as it proceeds.
def print_firsts(image_array, class_array):
    plt.figure(figsize=(20, 15))
    for i, n in enumerate(class_indices):
        plt.subplot(3, 4, i + 1)
        ax = plt.gca()
        ax.axes.xaxis.set_ticklabels([])
        ax.axes.yaxis.set_ticklabels([])
        plt.grid(False)
        plt.imshow(image_array[n[0]])
        plt.xlabel(class_array[n[0]], fontsize=16)
    plt.show()
Print an image from each label prior to processing, to make sure the data loaded correctly.
print_firsts(images, text_classes)
mean_images = []
plt.figure(figsize=(20, 15))
for i, block in enumerate(class_indices):
    # block holds inclusive (first, last) indices, so add 1 to the slice end
    mean_image = np.mean(images[block[0]:block[1] + 1] / 255, axis=0)
    mean_images.append(mean_image)
    plt.subplot(3, 4, i + 1)
    ax = plt.gca()
    ax.axes.xaxis.set_ticklabels([])
    ax.axes.yaxis.set_ticklabels([])
    plt.grid(False)
    plt.imshow(mean_images[i])
    plt.xlabel(unique_classes[i], fontsize=16)
plt.show()
The average images don't reveal much on their own, though some are donut-shaped (opposing leaves), some are solid green at the center, and in others (the grasses) barely any green is visible at all.
The images and labels appear to have a logical relationship. Similar seedlings have similar labels, so I'm confident the data has been put together properly.
There is a class imbalance, but no class looks to have so few samples that it can't be worked around. Resampling may become useful as well.
Since all the seedlings we're after are green, there are some opportunities to use image masking to help the models out.
This function can be called on any model I create to display its accuracy and loss per epoch, along with a confusion matrix and the typical classification metrics.
from sklearn.metrics import accuracy_score, recall_score, f1_score, confusion_matrix
def show_metrics(model, x_train, x_test, y_train, y_test, history):
    # Make predictions with the model
    y_hat_train = model.predict(x_train).argmax(axis=1)
    y_hat = model.predict(x_test).argmax(axis=1)
    # Convert the test data to the same format as the predictions
    y_train = y_train.argmax(axis=1)
    y_test = y_test.argmax(axis=1)
    acc = history.history['acc']
    val_acc = history.history['val_acc']
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    # Number of epochs run helps draw the charts
    epochs = range(1, len(acc) + 1)
    # Plot accuracy
    plt.figure()
    plt.plot(epochs, acc, 'b', label="Training Accuracy")
    plt.plot(epochs, val_acc, 'r', label="Validation Accuracy")
    plt.title("Training/Validation Accuracy")
    plt.legend(loc='lower right')
    plt.xlabel("epochs")
    plt.ylim(0, 1)
    # Plot loss
    plt.figure()
    plt.plot(epochs, loss, 'b', label="Training Loss")
    plt.plot(epochs, val_loss, 'r', label="Validation Loss")
    plt.title("Training/Validation Loss")
    plt.legend(loc='upper right')
    plt.xlabel("epochs")
    # ylim takes (bottom, top) as separate arguments
    plt.ylim(min(min(loss), min(val_loss)), max(max(loss), max(val_loss)))
    # Classification results for the training data
    train_acc = round(accuracy_score(y_train, y_hat_train), ndigits=2)
    train_rec = round(recall_score(y_train, y_hat_train, average='macro'), ndigits=2)
    train_f1 = round(f1_score(y_train, y_hat_train, average='macro'), ndigits=2)
    print("\n\nTraining Set Metrics: ")
    print('Accuracy: ', train_acc)
    print('Recall: ', train_rec)
    print('F1: ', train_f1)
    # Classification results for the test data
    test_acc = round(accuracy_score(y_test, y_hat), ndigits=2)
    test_rec = round(recall_score(y_test, y_hat, average='macro'), ndigits=2)
    test_f1 = round(f1_score(y_test, y_hat, average='macro'), ndigits=2)
    print("\nTest Set Metrics: ")
    print('Accuracy: ', test_acc)
    print('Recall: ', test_rec)
    print('F1: ', test_f1)
    print('\n')
    # Create the confusion matrix and display it as a heatmap
    conf_matrix = confusion_matrix(y_test, y_hat)
    plt.figure()
    heatmap = sns.heatmap(conf_matrix, cmap='magma', annot=True, fmt='d')
    heatmap.set_ylabel('actual')
    heatmap.set_xlabel('prediction')
    # Clean up memory
    gc.collect()
First I'm going to run a simple baseline model on the unprocessed images with no class balancing, then run the same model on the processed images to see whether the processing makes a difference in learning. If the results still fall short, I'll oversample the minority classes. Only then will I try to improve performance with more model complexity.
To do that I'm going to have to binarize the labels here and then do a test/train split.
from sklearn.preprocessing import LabelBinarizer
binarizer = LabelBinarizer()
classes = binarizer.fit_transform(classes)
Import necessary Keras functionality, set options for the models.
Options I'm keeping consistent throughout modeling:
from keras import layers # provides access to the various layer types available
from keras import models
from keras import optimizers
from keras.callbacks import EarlyStopping # allows us to stop in the middle of learning if a pre-determined performance level is achieved
from keras.callbacks import ReduceLROnPlateau
from keras.callbacks import ModelCheckpoint
Create a function to run any passed model, re-create the train/test split with the same seed, fit the model, and provide metrics and results:
batch_size=64
validation_split=.3
epochs=1000
# loss is Categorical Crossentropy for multi-class classification
loss = tf.keras.losses.CategoricalCrossentropy()
# Using the Adam optimizer with a coarse initial learning rate that will be automatically adjusted
optimizer = tf.keras.optimizers.Adam(learning_rate=.001)
# Implementing auto-learning-rate-reduction and early stopping callbacks that use the best weights achieved
lr = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, factor=0.9, min_lr=0.00001)
# Early Stopping: restore_best_weights means the model will use the best results instead of the last epoch
es = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
# Checkpointing saves the best weights to disk (redundant here, since EarlyStopping above already restores the best weights)
bw = ModelCheckpoint('best_weights.h5', monitor='val_acc', verbose=False, save_best_only=True, mode='max')
callbacks = [lr, es, bw]
from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator
def Run_Model(model, x, y):
    # Create a holdout split: set 30% aside for the final test
    x_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, test_size=.3, random_state=seed)
    x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=validation_split, random_state=seed)
    generator = ImageDataGenerator() # note I didn't use augmentation because it made the results WORSE
    train_generator = generator.flow(x_train, y_train, batch_size=batch_size)
    val_generator = generator.flow(x_val, y_val, batch_size=batch_size)
    # compile the model
    model.compile(loss=loss, optimizer=optimizer, metrics=['acc'])
    # summarize the model
    model.summary()
    history = model.fit(train_generator,
                        steps_per_epoch=len(x_train) // batch_size,
                        epochs=epochs,
                        validation_data=val_generator,
                        validation_steps=len(x_val) // batch_size,
                        callbacks=callbacks)
    show_metrics(model, x_train, x_test, y_train, y_test, history)
    return history
Build the baseline model, creating a function to return it so that it can be easily re-used without all the retyping of layers. This will just be used to test performance before and after data processing to see if it's benefiting the model.
def baseline_model():
    model = models.Sequential()
    model.add(layers.Conv2D(32, kernel_size=3, activation='relu', padding='same', input_shape=(128, 128, 3)))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(32, kernel_size=3, activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Conv2D(32, kernel_size=3, activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Dropout(.25))
    model.add(layers.Flatten()) # drops the inputs into a single dimension
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dropout(.5))
    # output layer: one softmax unit per class
    model.add(layers.Dense(12, activation='softmax'))
    return model
%%time
history_original_images = Run_Model(baseline_model(), images, classes)
Model: "sequential"
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 128, 128, 32)      896
max_pooling2d (MaxPooling2D) (None, 64, 64, 32)        0
conv2d_1 (Conv2D)            (None, 64, 64, 32)        9248
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 32)        0
conv2d_2 (Conv2D)            (None, 32, 32, 32)        9248
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 32)        0
dropout (Dropout)            (None, 16, 16, 32)        0
flatten (Flatten)            (None, 8192)              0
dense (Dense)                (None, 256)               2097408
dropout_1 (Dropout)          (None, 256)               0
dense_1 (Dense)              (None, 12)                3084
=================================================================
Total params: 2,119,884
Trainable params: 2,119,884
Non-trainable params: 0

Epoch 1/1000  36/36 - 3s - loss: 6.0502 - acc: 0.1635 - val_loss: 1.9667 - val_acc: 0.3969
Epoch 2/1000  36/36 - 1s - loss: 1.6999 - acc: 0.4468 - val_loss: 1.4015 - val_acc: 0.5375
Epoch 3/1000  36/36 - 1s - loss: 1.3813 - acc: 0.5444 - val_loss: 1.2358 - val_acc: 0.5833
Epoch 4/1000  36/36 - 1s - loss: 1.1375 - acc: 0.6301 - val_loss: 1.0735 - val_acc: 0.6281
Epoch 5/1000  36/36 - 1s - loss: 0.9172 - acc: 0.6947 - val_loss: 0.9555 - val_acc: 0.6948
Epoch 6/1000  36/36 - 1s - loss: 0.7625 - acc: 0.7530 - val_loss: 0.9529 - val_acc: 0.6885
Epoch 7/1000  36/36 - 1s - loss: 0.6542 - acc: 0.7852 - val_loss: 0.9398 - val_acc: 0.6969
Epoch 8/1000  36/36 - 1s - loss: 0.5520 - acc: 0.8166 - val_loss: 0.9869 - val_acc: 0.6948
Epoch 9/1000  36/36 - 1s - loss: 0.4755 - acc: 0.8400 - val_loss: 0.8779 - val_acc: 0.7219
Epoch 10/1000  36/36 - 1s - loss: 0.3466 - acc: 0.8873 - val_loss: 0.9013 - val_acc: 0.7333
Epoch 11/1000  36/36 - 1s - loss: 0.3529 - acc: 0.8789 - val_loss: 1.0158 - val_acc: 0.6990
Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0009000000427477062.
Epoch 12/1000  36/36 - 1s - loss: 0.3449 - acc: 0.8957 - val_loss: 0.9818 - val_acc: 0.7125
Epoch 13/1000  36/36 - 1s - loss: 0.2285 - acc: 0.9174 - val_loss: 0.9857 - val_acc: 0.7375
Epoch 00013: ReduceLROnPlateau reducing learning rate to 0.0008100000384729356.
Epoch 14/1000  36/36 - 1s - loss: 0.1926 - acc: 0.9315 - val_loss: 0.9896 - val_acc: 0.7281
Epoch 15/1000  36/36 - 1s - loss: 0.1712 - acc: 0.9430 - val_loss: 1.1114 - val_acc: 0.7406
Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.0007290000503417104.
Epoch 16/1000  36/36 - 1s - loss: 0.1367 - acc: 0.9602 - val_loss: 1.1696 - val_acc: 0.7156
Epoch 17/1000  36/36 - 1s - loss: 0.1395 - acc: 0.9514 - val_loss: 1.1098 - val_acc: 0.7115
Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.0006561000715009868.
Epoch 18/1000  36/36 - 1s - loss: 0.1315 - acc: 0.9611 - val_loss: 1.1245 - val_acc: 0.7302
Epoch 19/1000  36/36 - 1s - loss: 0.1072 - acc: 0.9629 - val_loss: 1.1076 - val_acc: 0.7427
Epoch 00019: ReduceLROnPlateau reducing learning rate to 0.0005904900433961303.

Training Set Metrics:
Accuracy: 0.97
Recall: 0.96
F1: 0.96

Test Set Metrics:
Accuracy: 0.69
Recall: 0.62
F1: 0.62

Wall time: 16.6 s
The model is overfitting, as shown by training accuracy that doesn't carry over to the validation and test sets, but it's performing decently considering the images are completely unprocessed and the model is not complex. Many classes are still being misclassified at this stage.
I'll now do some pre-processing on the images and see how it impacts the model. I'll use the same model for an apples to apples comparison.
CNNs usually work better on slightly blurred images, because the blur smooths out noise.
filtered_images = []
for image in images:
    filtered_images.append(cv2.GaussianBlur(image, (3, 3), 0))
filtered_images = np.array(filtered_images)
print_firsts(filtered_images, text_classes)
Converting to HSV color space will make it easier to filter out objects and background that we don't want: it makes non-plant colors stand out.
hsv_images = []
for image in filtered_images:
    hsv_images.append(cv2.cvtColor(image, cv2.COLOR_BGR2HSV))
hsv_images = np.array(hsv_images)
print_firsts(hsv_images, text_classes)
Recompute the average images in HSV to bring out a little more detail.
mean_images = []
plt.figure(figsize=(20, 15))
for i, block in enumerate(class_indices):
    # block holds inclusive (first, last) indices, so add 1 to the slice end
    mean_image = np.mean(hsv_images[block[0]:block[1] + 1] / 255, axis=0)
    mean_images.append(mean_image)
    plt.subplot(3, 4, i + 1)
    ax = plt.gca()
    ax.axes.xaxis.set_ticklabels([])
    ax.axes.yaxis.set_ticklabels([])
    plt.grid(False)
    plt.imshow(mean_images[i])
    plt.xlabel(unique_classes[i], fontsize=16)
plt.show()
You can definitely see more variety and defining features in the average images when converted to the HSV color space. Seedlings have definite circular patterns, some solid, some hollow, and many with bright centers.
Masking out objects in the scene that we don't need will help isolate the plant structure. I do this via HSV color ranges; essentially I'm just filtering out everything that isn't green.
masks = []
# HSV range covering green hues
low = (25, 40, 50)
high = (75, 255, 255)
for image in hsv_images:
    # create a binary mask keeping only pixels in the green HSV range
    # (this works because all of the seedlings are green plants)
    mask = cv2.inRange(image, low, high)
    masks.append(mask)
masks = np.array(masks)
print_firsts(masks, text_classes)
Now that the masks are created, I'll apply them to the denoised images from above to isolate what we're interested in: the plants themselves.
masked_images = []
# elliptical structuring element used to close small holes in each mask
shape = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (13, 13))
for i, mask in enumerate(masks):
    # close gaps in the mask so plant regions are solid
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, shape)
    # select everything that isn't black (masked out)
    boolean_mask = mask > 0
    # create the new image using the mask
    masked_image = np.zeros_like(filtered_images[i], np.uint8)
    masked_image[boolean_mask] = filtered_images[i][boolean_mask]
    masked_images.append(masked_image)
masked_images = np.array(masked_images)
print_firsts(masked_images, text_classes)
Since pixel values range from 0 to 255, dividing by the maximum normalizes the data to the 0-1 range, which can help the learning algorithm.
Note that rather than normalizing the images this way, the rescaling can be done inside the model itself; Keras provides a preprocessing layer for exactly this.
normalized_images = np.array(masked_images) / 255
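As a sketch of the in-model alternative mentioned above: Keras ships a `Rescaling` layer that applies the same 0-255 to 0-1 scaling as the division by 255 (assuming TF >= 2.6, where it is a stable layer; older 2.x versions expose it as `tf.keras.layers.experimental.preprocessing.Rescaling`).

```python
import numpy as np
import tensorflow as tf

# The layer multiplies inputs by the given scale, so 1/255 maps 0-255 into 0-1
rescale = tf.keras.layers.Rescaling(1.0 / 255)

pixels = np.array([[0.0, 127.5, 255.0]], dtype=np.float32)
scaled = rescale(pixels).numpy()
print(scaled)  # [[0.  0.5 1. ]]
```

Placed as the first layer of a `Sequential` model, this keeps the normalization bundled with the model, so raw uint8 images can be fed directly at inference time.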
This will be compared to the baseline model on the unprocessed images to see how much predictive performance has improved just by processing the images into a different format.
%%time
history_processed_images = Run_Model(baseline_model(), normalized_images, classes)
Model: "sequential_1"
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 128, 128, 32)      896
max_pooling2d_3 (MaxPooling2 (None, 64, 64, 32)        0
conv2d_4 (Conv2D)            (None, 64, 64, 32)        9248
max_pooling2d_4 (MaxPooling2 (None, 32, 32, 32)        0
conv2d_5 (Conv2D)            (None, 32, 32, 32)        9248
max_pooling2d_5 (MaxPooling2 (None, 16, 16, 32)        0
dropout_2 (Dropout)          (None, 16, 16, 32)        0
flatten_1 (Flatten)          (None, 8192)              0
dense_2 (Dense)              (None, 256)               2097408
dropout_3 (Dropout)          (None, 256)               0
dense_3 (Dense)              (None, 12)                3084
=================================================================
Total params: 2,119,884
Trainable params: 2,119,884
Non-trainable params: 0

Epoch 1/1000  36/36 - 1s - loss: 2.0697 - acc: 0.2930 - val_loss: 1.6176 - val_acc: 0.4427
Epoch 2/1000  36/36 - 1s - loss: 1.4093 - acc: 0.5108 - val_loss: 1.2932 - val_acc: 0.5604
Epoch 3/1000  36/36 - 1s - loss: 1.0766 - acc: 0.6346 - val_loss: 1.0798 - val_acc: 0.6260
Epoch 4/1000  36/36 - 1s - loss: 0.7943 - acc: 0.7287 - val_loss: 0.9772 - val_acc: 0.6760
Epoch 5/1000  36/36 - 1s - loss: 0.6595 - acc: 0.7791 - val_loss: 0.9734 - val_acc: 0.6781
Epoch 6/1000  36/36 - 1s - loss: 0.5396 - acc: 0.8095 - val_loss: 0.9434 - val_acc: 0.6938
Epoch 7/1000  36/36 - 1s - loss: 0.4808 - acc: 0.8246 - val_loss: 0.9825 - val_acc: 0.6865
Epoch 8/1000  36/36 - 1s - loss: 0.3716 - acc: 0.8741 - val_loss: 0.9682 - val_acc: 0.6958
Epoch 00008: ReduceLROnPlateau reducing learning rate to 0.0005314410547725857.
Epoch 9/1000  36/36 - 1s - loss: 0.2652 - acc: 0.9085 - val_loss: 1.0177 - val_acc: 0.7094
Epoch 10/1000  36/36 - 1s - loss: 0.2387 - acc: 0.9222 - val_loss: 1.0795 - val_acc: 0.7125
Epoch 00010: ReduceLROnPlateau reducing learning rate to 0.00047829695977270604.
Epoch 11/1000  36/36 - 1s - loss: 0.2134 - acc: 0.9271 - val_loss: 1.1498 - val_acc: 0.7094
Epoch 12/1000  36/36 - 1s - loss: 0.1769 - acc: 0.9408 - val_loss: 1.1313 - val_acc: 0.7146
Epoch 00012: ReduceLROnPlateau reducing learning rate to 0.0004304672533180565.
Epoch 13/1000  36/36 - 1s - loss: 0.1481 - acc: 0.9571 - val_loss: 1.1842 - val_acc: 0.7167
Epoch 14/1000  36/36 - 1s - loss: 0.1213 - acc: 0.9611 - val_loss: 1.2012 - val_acc: 0.7240
Epoch 00014: ReduceLROnPlateau reducing learning rate to 0.00038742052274756136.
Epoch 15/1000  36/36 - 1s - loss: 0.1050 - acc: 0.9629 - val_loss: 1.1991 - val_acc: 0.7208
Epoch 16/1000  36/36 - 1s - loss: 0.1245 - acc: 0.9607 - val_loss: 1.2480 - val_acc: 0.7198
Epoch 00016: ReduceLROnPlateau reducing learning rate to 0.0003486784757114947.

Training Set Metrics:
Accuracy: 0.92
Recall: 0.9
F1: 0.91

Test Set Metrics:
Accuracy: 0.7
Recall: 0.63
F1: 0.63

Wall time: 11.8 s
The model is still misclassifying some, but it has improved from the unprocessed images considerably. This bodes well for future steps.
An imbalanced class situation can make it difficult for the algorithms to differentiate between classes, so I'm going to oversample the dataset.
I didn't get good results with SMOTE on images, so instead I'm using a basic Random Oversampler to resample with replacement and get all the classes balanced.
from imblearn.over_sampling import RandomOverSampler
images_reshaped = normalized_images.reshape(4750, 128 * 128 * 3)
images_reshaped.shape
(4750, 49152)
ros = RandomOverSampler(random_state=seed)
x_ros, y_ros = ros.fit_resample(images_reshaped, classes)
x_ros = x_ros.reshape(-1, 128, 128, 3) # -1 infers the oversampled sample count (7848 here)
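The flatten-then-restore trick used above is worth spelling out: oversamplers expect 2-D input of shape (samples, features), and the image structure is recovered afterwards by reshaping. A minimal sketch on toy-sized arrays (6 images of 4x4x3 standing in for the real 4750 x 128 x 128 x 3 data):

```python
import numpy as np

# Toy stand-in for the real image array
imgs = np.arange(6 * 4 * 4 * 3, dtype=float).reshape(6, 4, 4, 3)

# Flatten each image to one row for the oversampler...
flat = imgs.reshape(len(imgs), -1)

# ...then reshape back; -1 lets numpy infer the (possibly grown) sample count
restored = flat.reshape(-1, 4, 4, 3)
print(np.array_equal(restored, imgs))  # True
```

Using -1 for the sample dimension avoids hard-coding the post-oversampling count, which changes whenever the class balance changes.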
plt.figure(figsize=(12, 6))
plt.title("Class Balance of Seedlings After Oversampling", fontsize=16)
labels, counts = np.unique(binarizer.inverse_transform(y_ros), return_counts=True)
plt.bar(labels, counts)
plt.xticks(rotation=90, fontsize=16)
plt.show()
This will allow me to see what predictive performance improvement I've achieved simply by oversampling the imbalanced classes.
%%time
history_ros = Run_Model(baseline_model(), x_ros, y_ros)
Model: "sequential_2"
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_6 (Conv2D)            (None, 128, 128, 32)      896
max_pooling2d_6 (MaxPooling2 (None, 64, 64, 32)        0
conv2d_7 (Conv2D)            (None, 64, 64, 32)        9248
max_pooling2d_7 (MaxPooling2 (None, 32, 32, 32)        0
conv2d_8 (Conv2D)            (None, 32, 32, 32)        9248
max_pooling2d_8 (MaxPooling2 (None, 16, 16, 32)        0
dropout_4 (Dropout)          (None, 16, 16, 32)        0
flatten_2 (Flatten)          (None, 8192)              0
dense_4 (Dense)              (None, 256)               2097408
dropout_5 (Dropout)          (None, 256)               0
dense_5 (Dense)              (None, 12)                3084
=================================================================
Total params: 2,119,884
Trainable params: 2,119,884
Non-trainable params: 0

Epoch 1/1000  60/60 - 2s - loss: 2.0174 - acc: 0.2722 - val_loss: 1.3572 - val_acc: 0.5381
Epoch 2/1000  60/60 - 1s - loss: 1.2084 - acc: 0.5726 - val_loss: 0.8813 - val_acc: 0.7006
Epoch 3/1000  60/60 - 1s - loss: 0.8338 - acc: 0.7070 - val_loss: 0.6900 - val_acc: 0.7744
Epoch 4/1000  60/60 - 1s - loss: 0.6263 - acc: 0.7829 - val_loss: 0.6334 - val_acc: 0.7981
Epoch 5/1000  60/60 - 1s - loss: 0.4818 - acc: 0.8350 - val_loss: 0.5724 - val_acc: 0.8313
Epoch 6/1000  60/60 - 1s - loss: 0.4087 - acc: 0.8606 - val_loss: 0.5137 - val_acc: 0.8344
Epoch 7/1000  60/60 - 1s - loss: 0.3112 - acc: 0.8939 - val_loss: 0.4917 - val_acc: 0.8438
Epoch 8/1000  60/60 - 1s - loss: 0.2705 - acc: 0.9103 - val_loss: 0.5071 - val_acc: 0.8512
Epoch 9/1000  60/60 - 1s - loss: 0.2208 - acc: 0.9238 - val_loss: 0.5379 - val_acc: 0.8537
Epoch 00009: ReduceLROnPlateau reducing learning rate to 0.00031381062290165574.
Epoch 10/1000  60/60 - 1s - loss: 0.1812 - acc: 0.9405 - val_loss: 0.5408 - val_acc: 0.8519
Epoch 11/1000  60/60 - 1s - loss: 0.1539 - acc: 0.9540 - val_loss: 0.5622 - val_acc: 0.8450
Epoch 00011: ReduceLROnPlateau reducing learning rate to 0.0002824295632308349.
Epoch 12/1000  60/60 - 1s - loss: 0.1487 - acc: 0.9516 - val_loss: 0.5639 - val_acc: 0.8594
Epoch 13/1000  60/60 - 1s - loss: 0.1255 - acc: 0.9611 - val_loss: 0.5110 - val_acc: 0.8644
Epoch 00013: ReduceLROnPlateau reducing learning rate to 0.00025418660952709616.
Epoch 14/1000  60/60 - 1s - loss: 0.1087 - acc: 0.9630 - val_loss: 0.6314 - val_acc: 0.8450
Epoch 15/1000  60/60 - 1s - loss: 0.1035 - acc: 0.9683 - val_loss: 0.5322 - val_acc: 0.8606
Epoch 00015: ReduceLROnPlateau reducing learning rate to 0.00022876793809700757.
Epoch 16/1000  60/60 - 1s - loss: 0.0942 - acc: 0.9685 - val_loss: 0.5486 - val_acc: 0.8581
Epoch 17/1000  60/60 - 1s - loss: 0.0779 - acc: 0.9765 - val_loss: 0.5727 - val_acc: 0.8481
Epoch 00017: ReduceLROnPlateau reducing learning rate to 0.00020589114428730683.

Training Set Metrics:
Accuracy: 0.98
Recall: 0.98
F1: 0.98

Test Set Metrics:
Accuracy: 0.84
Recall: 0.84
F1: 0.84

Wall time: 20.7 s
The baseline model is actually performing really well on the test set. There is some overfitting, but I think this is close to as good as it's going to get with this many images in the dataset.
Key changes in this model are an additional convolution/pooling block and an additional dense layer.
model = models.Sequential()
model.add(layers.Conv2D(32, kernel_size=3, activation='relu', padding = 'same', input_shape=(128, 128, 3)))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Conv2D(32, kernel_size=3, activation='relu', padding = 'same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Conv2D(32, kernel_size=3, activation='relu', padding = 'same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Conv2D(32, kernel_size=3, activation='relu', padding = 'same'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Dropout(.25))
model.add(layers.Flatten()) # drops the inputs into a single dimension
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(.5))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(.5))
# output layer
model.add(layers.Dense(12, activation='softmax'))
history_ros2 = Run_Model(model, x_ros, y_ros)
Model: "sequential_3"
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_9 (Conv2D)            (None, 128, 128, 32)      896
max_pooling2d_9 (MaxPooling2 (None, 64, 64, 32)        0
conv2d_10 (Conv2D)           (None, 64, 64, 32)        9248
max_pooling2d_10 (MaxPooling (None, 32, 32, 32)        0
conv2d_11 (Conv2D)           (None, 32, 32, 32)        9248
max_pooling2d_11 (MaxPooling (None, 16, 16, 32)        0
conv2d_12 (Conv2D)           (None, 16, 16, 32)        9248
max_pooling2d_12 (MaxPooling (None, 8, 8, 32)          0
dropout_6 (Dropout)          (None, 8, 8, 32)          0
flatten_3 (Flatten)          (None, 2048)              0
dense_6 (Dense)              (None, 256)               524544
dropout_7 (Dropout)          (None, 256)               0
dense_7 (Dense)              (None, 128)               32896
dropout_8 (Dropout)          (None, 128)               0
dense_8 (Dense)              (None, 12)                1548
=================================================================
Total params: 587,628
Trainable params: 587,628
Non-trainable params: 0

Epoch 1/1000  60/60 - 2s - loss: 2.3487 - acc: 0.1418 - val_loss: 1.9061 - val_acc: 0.3019
Epoch 2/1000  60/60 - 1s - loss: 1.8476 - acc: 0.3137 - val_loss: 1.5202 - val_acc: 0.4194
Epoch 3/1000  60/60 - 1s - loss: 1.5683 - acc: 0.4306 - val_loss: 1.1855 - val_acc: 0.5962
Epoch 4/1000  60/60 - 1s - loss: 1.3792 - acc: 0.4993 - val_loss: 1.0511 - val_acc: 0.6331
Epoch 5/1000  60/60 - 1s - loss: 1.3189 - acc: 0.5319 - val_loss: 1.0421 - val_acc: 0.6606
Epoch 6/1000  60/60 - 1s - loss: 1.2193 - acc: 0.5591 - val_loss: 0.9235 - val_acc: 0.6888
Epoch 7/1000  60/60 - 1s - loss: 1.1215 - acc: 0.5879 - val_loss: 0.8348 - val_acc: 0.7169
Epoch 8/1000  60/60 - 1s - loss: 1.0638 - acc: 0.6139 - val_loss: 0.8039 - val_acc: 0.7287
Epoch 9/1000  60/60 - 1s - loss: 0.9972 - acc: 0.6453 - val_loss: 0.7619 - val_acc: 0.7481
Epoch 10/1000  60/60 - 1s - loss: 0.9486 - acc: 0.6691 - val_loss: 0.7288 - val_acc: 0.7569
Epoch 11/1000  60/60 - 1s - loss: 0.8862 - acc: 0.6852 - val_loss: 0.6733 - val_acc: 0.7744
Epoch 12/1000  60/60 - 1s - loss: 0.8402 - acc: 0.6921 - val_loss: 0.6396 - val_acc: 0.7769
Epoch 13/1000  60/60 - 1s - loss: 0.8143 - acc: 0.7064 - val_loss: 0.6147 - val_acc: 0.7844
Epoch 14/1000  60/60 - 1s - loss: 0.7650 - acc: 0.7273 - val_loss: 0.5940 - val_acc: 0.7944
Epoch 15/1000  60/60 - 1s - loss: 0.7566 - acc: 0.7252 - val_loss: 0.5812 - val_acc: 0.8056
Epoch 16/1000  60/60 - 1s - loss: 0.7490 - acc: 0.7321 - val_loss: 0.5780 - val_acc: 0.7962
Epoch 17/1000  60/60 - 1s - loss: 0.6957 - acc: 0.7517 - val_loss: 0.5375 - val_acc: 0.8188
Epoch 18/1000  60/60 - 1s - loss: 0.6515 - acc: 0.7665 - val_loss: 0.5156 - val_acc: 0.8175
Epoch 19/1000  60/60 - 1s - loss: 0.6410 - acc: 0.7633 - val_loss: 0.5078 - val_acc: 0.8225
Epoch 20/1000  60/60 - 1s - loss: 0.6129 - acc: 0.7768 - val_loss: 0.4895 - val_acc: 0.8331
Epoch 21/1000  60/60 - 1s - loss: 0.5917 - acc: 0.7850 - val_loss: 0.4744 - val_acc: 0.8275
Epoch 22/1000  60/60 - 1s - loss: 0.5913 - acc: 0.7821 - val_loss: 0.5015 - val_acc: 0.8256
Epoch 23/1000  60/60 - 1s - loss: 0.5504 - acc: 0.8014 - val_loss: 0.4586 - val_acc: 0.8438
Epoch 24/1000  60/60 - 1s - loss: 0.5428 - acc: 0.8051 - val_loss: 0.4556 - val_acc: 0.8388
Epoch 25/1000  60/60 - 1s - loss: 0.5408 - acc: 0.8019 - val_loss: 0.4692 - val_acc: 0.8294
Epoch 26/1000  60/60 - 1s - loss: 0.5348 - acc: 0.8056 - val_loss: 0.4465 - val_acc: 0.8450
Epoch 27/1000  60/60 - 1s - loss: 0.4977 - acc: 0.8172 - val_loss: 0.4494 - val_acc: 0.8388
Epoch 28/1000  60/60 - 1s - loss: 0.5038 - acc: 0.8154 - val_loss: 0.4660 - val_acc: 0.8344
Epoch 00028: ReduceLROnPlateau reducing learning rate to 0.00018530203378759326.
Epoch 29/1000  60/60 - 1s - loss: 0.5001 - acc: 0.8167 - val_loss: 0.4407 - val_acc: 0.8494
Epoch 30/1000  60/60 - 1s - loss: 0.4511 - acc: 0.8289 - val_loss: 0.4346 - val_acc: 0.8413
Epoch 31/1000  60/60 - 1s - loss: 0.4433 - acc: 0.8363 - val_loss: 0.4384 - val_acc: 0.8400
Epoch 32/1000  60/60 - 1s - loss: 0.4295 - acc: 0.8498 - val_loss: 0.4395 - val_acc: 0.8394
Epoch 00032: ReduceLROnPlateau reducing learning rate to 0.00016677183302817866.
Epoch 33/1000  60/60 - 1s - loss: 0.4397 - acc: 0.8392 - val_loss: 0.4303 - val_acc: 0.8475
Epoch 34/1000  60/60 - 1s - loss: 0.4083 - acc: 0.8519 - val_loss: 0.4053 - val_acc: 0.8575
Epoch 35/1000  60/60 - 1s - loss: 0.4088 - acc: 0.8479 - val_loss: 0.3980 - val_acc: 0.8587
Epoch 36/1000  60/60 - 1s - loss: 0.4149 - acc: 0.8633 - val_loss: 0.4087 - val_acc: 0.8525
Epoch 37/1000  60/60 - 1s - loss: 0.3808 - acc: 0.8598 - val_loss: 0.3985 - val_acc: 0.8600
Epoch 00037: ReduceLROnPlateau reducing learning rate to 0.00015009464841568844.
Epoch 38/1000  60/60 - 1s - loss: 0.3683 - acc: 0.8593 - val_loss: 0.4117 - val_acc: 0.8606
Epoch 39/1000  60/60 - 1s - loss: 0.4089 - acc: 0.8490 - val_loss: 0.4122 - val_acc: 0.8600
Epoch 00039: ReduceLROnPlateau reducing learning rate to 0.0001350851875031367.
Epoch 40/1000  60/60 - 1s - loss: 0.3676 - acc: 0.8580 - val_loss: 0.4014 - val_acc: 0.8712
Epoch 41/1000  60/60 - 1s - loss: 0.3555 - acc: 0.8715 - val_loss: 0.3956 - val_acc: 0.8669
Epoch 42/1000  60/60 - 1s - loss: 0.3588 - acc: 0.8738 - val_loss: 0.4053 - val_acc: 0.8662
Epoch 43/1000  60/60 - 1s - loss: 0.3766 - acc: 0.8651 - val_loss: 0.4055 - val_acc: 0.8681
Epoch 00043: ReduceLROnPlateau reducing learning rate to 0.00012157666351413355.
Epoch 44/1000  60/60 - 1s - loss: 0.3446 - acc: 0.8682 - val_loss: 0.4001 - val_acc: 0.8694
Epoch 45/1000  60/60 - 1s - loss: 0.3672 - acc: 0.8712 - val_loss: 0.4308 - val_acc: 0.8575
Epoch 00045: ReduceLROnPlateau reducing learning rate to 0.00010941899454337544.
Epoch 46/1000  60/60 - 1s - loss: 0.3522 - acc: 0.8664 - val_loss: 0.3889 - val_acc: 0.8706
Epoch 47/1000  60/60 - 1s - loss: 0.3384 - acc: 0.8733 - val_loss: 0.3870 - val_acc: 0.8737
Epoch 48/1000  60/60 - 1s - loss: 0.3240 - acc: 0.8857 - val_loss: 0.3818 - val_acc: 0.8819
Epoch 49/1000  60/60 - 1s - loss: 0.3261 - acc: 0.8781 - val_loss: 0.4011 - val_acc: 0.8788
Epoch 50/1000  60/60 - 1s - loss: 0.3080 - acc: 0.8873 - val_loss: 0.4130 - val_acc: 0.8687
Epoch 00050: ReduceLROnPlateau reducing learning rate to 9.847709443420172e-05.
Epoch 51/1000  60/60 - 1s - loss: 0.3130 - acc: 0.8887 - val_loss: 0.3791 - val_acc: 0.8800
Epoch 52/1000  60/60 - 1s - loss: 0.3164 - acc: 0.8797 - val_loss: 0.3929 - val_acc: 0.8731
Epoch 53/1000  60/60 - 1s - loss: 0.3137 - acc: 0.8865 - val_loss: 0.3845 - val_acc: 0.8813
Epoch 00053: ReduceLROnPlateau reducing learning rate to 8.862938630045391e-05.
Epoch 54/1000  60/60 - 1s - loss: 0.2908 - acc: 0.8879 - val_loss: 0.3717 - val_acc: 0.8781
Epoch 55/1000  60/60 - 1s - loss: 0.2946 - acc: 0.8947 - val_loss: 0.3905 - val_acc: 0.8775
Epoch 56/1000  60/60 - 1s - loss: 0.3000 - acc: 0.8860 - val_loss: 0.3952 - val_acc: 0.8637
Epoch 00056: ReduceLROnPlateau reducing learning rate to 7.976644701557234e-05.
Epoch 57/1000 60/60 - 1s - loss: 0.2921 - acc: 0.8855 - val_loss: 0.3940 - val_acc: 0.8788 Epoch 58/1000 60/60 - 1s - loss: 0.2978 - acc: 0.8881 - val_loss: 0.3860 - val_acc: 0.8781 Epoch 00058: ReduceLROnPlateau reducing learning rate to 7.178980231401511e-05. Epoch 59/1000 60/60 - 1s - loss: 0.3027 - acc: 0.8868 - val_loss: 0.3864 - val_acc: 0.8800 Epoch 60/1000 60/60 - 1s - loss: 0.2966 - acc: 0.8855 - val_loss: 0.3798 - val_acc: 0.8725 Epoch 00060: ReduceLROnPlateau reducing learning rate to 6.461082011810504e-05. Epoch 61/1000 60/60 - 1s - loss: 0.2900 - acc: 0.8913 - val_loss: 0.3842 - val_acc: 0.8769 Epoch 62/1000 60/60 - 1s - loss: 0.2681 - acc: 0.8995 - val_loss: 0.3890 - val_acc: 0.8806 Epoch 00062: ReduceLROnPlateau reducing learning rate to 5.8149741380475466e-05. Epoch 63/1000 60/60 - 1s - loss: 0.2806 - acc: 0.8908 - val_loss: 0.3853 - val_acc: 0.8769 Epoch 64/1000 60/60 - 1s - loss: 0.2803 - acc: 0.8939 - val_loss: 0.3761 - val_acc: 0.8775 Epoch 00064: ReduceLROnPlateau reducing learning rate to 5.233476658759173e-05. Training Set Metrics: Accuracy: 0.96 Recall: 0.96 F1: 0.96 Test Set Metrics: Accuracy: 0.87 Recall: 0.87 F1: 0.87
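The parameter counts in the summary above can be reproduced by hand. This sketch assumes 3x3 convolution kernels (the summary does not show kernel sizes, but 3x3 is consistent with every reported count): a Conv2D layer has kh*kw*in_ch*out_ch weights plus one bias per filter, and a Dense layer has n_in*n_out weights plus one bias per output unit.

```python
def conv2d_params(kh, kw, in_ch, out_ch):
    """Kernel weights (kh*kw*in_ch per filter) plus one bias per filter."""
    return kh * kw * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

layers = {
    'conv2d_9':  conv2d_params(3, 3, 3, 32),     # 896 (RGB input, 3 channels)
    'conv2d_10': conv2d_params(3, 3, 32, 32),    # 9248
    'conv2d_11': conv2d_params(3, 3, 32, 32),    # 9248
    'conv2d_12': conv2d_params(3, 3, 32, 32),    # 9248
    'dense_6':   dense_params(8 * 8 * 32, 256),  # flatten gives 2048 -> 524544
    'dense_7':   dense_params(256, 128),         # 32896
    'dense_8':   dense_params(128, 12),          # 1548 (12 seedling classes)
}
print(sum(layers.values()))  # 587628, matching "Total params: 587,628"
```

The pooling, dropout, and flatten layers contribute no parameters, which is why they show 0 in the summary.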
Although test/validation performance still lags training performance somewhat, the model is now reasonably well fit. I suspect part of the remaining performance gap is linked to us using a pared-down dataset for this project.
Overall, this model achieves roughly 87% accuracy and recall on the test set when identifying a seedling.
It struggles most to differentiate classes 0 and 6 (Black-grass and Loose Silky-bent), most likely because both have simple, grass-like profiles with few distinguishing characteristics between them. If the end goal is to separate seedlings from grasses, this may not be a problem.
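A quick way to see this kind of pairwise confusion is a confusion matrix. The sketch below uses a hand-rolled NumPy implementation and made-up labels purely for illustration (the real test labels and predictions are not reproduced here); `sklearn.metrics.confusion_matrix` would do the same job.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts samples whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy labels: class 0 (Black-grass) frequently predicted as class 6 (Loose Silky-bent).
y_true = np.array([0, 0, 0, 0, 6, 6, 6, 1])
y_pred = np.array([0, 6, 6, 0, 6, 0, 6, 1])

cm = confusion_matrix(y_true, y_pred, n_classes=12)
print(cm[0, 6], cm[6, 0])  # off-diagonal mass concentrated between the two grasses
```

Large values at cm[0, 6] and cm[6, 0] relative to the rest of the off-diagonal entries are exactly the Black-grass/Loose Silky-bent confusion described above.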
Among the things I tried that were NOT successful was using ImageDataGenerator's options to rotate, zoom, and otherwise vary the images, in the hope of giving the algorithm more variation to learn from. Gaussian blurring also did not appear to help the model, but I left it in due to project requirements.
I think a key takeaway here is the importance of properly preparing and processing the images before building a model, particularly if early results are poor. The largest jumps in performance came from image preparation and oversampling.
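As a rough illustration of the oversampling idea (the function and variable names here are hypothetical, not the project's actual code): minority classes are resampled with replacement until every class matches the size of the largest one.

```python
import numpy as np

def oversample(images, labels, seed=1):
    """Resample each class (with replacement) up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    class_names, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c in class_names:
        class_idx = np.flatnonzero(labels == c)
        idx.extend(class_idx)                                        # keep every original
        extras = target - len(class_idx)
        idx.extend(rng.choice(class_idx, size=extras, replace=True)) # add duplicates
    idx = np.array(idx)
    return images[idx], labels[idx]

# Toy data: class 'a' has 3 samples, class 'b' has 1; both end up with 3.
X = np.arange(4 * 2).reshape(4, 2)
y = np.array(['a', 'a', 'a', 'b'])
X_bal, y_bal = oversample(X, y)
print(np.unique(y_bal, return_counts=True)[1])  # [3 3]
```

Because duplicates are exact copies, oversampling should be done only on the training split (after the train/test split), otherwise copies of a training image can leak into the test set and inflate the reported metrics.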